crowd counting
Incorporating Side Information by Adaptive Convolution
Computer vision tasks often come with side information that is helpful for solving the task. For example, in crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems based on traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. To incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), in which the convolution filter weights adapt to the current scene context via the side information.
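The core idea, filter weights that are a function of side information rather than fixed parameters, can be sketched as below. This is a minimal sketch under stated assumptions: a single-channel feature map, and a hypothetical filter-generating network reduced to one linear layer (`W`, `b`). The names `generate_filter`, `conv2d_valid` and `adaptive_conv` are illustrative, and the actual ACNN architecture is not described in this abstract.

```python
import numpy as np

def generate_filter(side_info, W, b, k):
    # Hypothetical filter generator: one linear layer maps the side-information
    # vector (length m) to k*k filter weights. W has shape (k*k, m), b has shape (k*k,).
    return (W @ side_info + b).reshape(k, k)

def conv2d_valid(x, f):
    # 'Valid' 2-D cross-correlation of a single-channel map x with filter f,
    # i.e. the operation deep-learning libraries call convolution.
    k = f.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * f)
    return out

def adaptive_conv(x, side_info, W, b, k=3):
    # The filter is regenerated from the side information, so the same input
    # is filtered differently under different scene contexts.
    return conv2d_valid(x, generate_filter(side_info, W, b, k))

rng = np.random.default_rng(0)
k = 3
W = rng.normal(size=(k * k, 2))   # 2 side-information values (assumed: camera angle, height)
b = np.zeros(k * k)
x = rng.normal(size=(8, 8))       # a single-channel feature map
y_a = adaptive_conv(x, np.array([0.1, 2.0]), W, b, k)
y_b = adaptive_conv(x, np.array([0.5, 10.0]), W, b, k)
```

Because only the filter generator holds trainable parameters in this sketch, one set of weights can serve many camera settings; changing the side-information vector changes the filter applied to the same feature map.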
Distribution Matching for Crowd Counting Supplementary Material
… DM-Count and investigate the robustness of different methods to noisy annotations. Assume that for all $x \in \mathcal{D}$ and $g \in \mathcal{G}$ we have $|g(x)| \le B$. We propose the following five lemmas, which are essential for proving the proposed theorems. Lemmas A, B, C and D give the Lipschitz constants of different loss functions. Consider the dual form of Eq. (15): $W(\mu, \nu) = \max_{\alpha} \ldots$ The first inequality in Eq. (20) holds because … The second equality in Eq. (20) holds because … We restate Theorem 1 of the main paper below.
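The dual form referenced above is truncated in this excerpt. For orientation only, the standard Kantorovich dual of a discrete optimal transport problem is shown below, assuming Eq. (15) is the transport cost between two distributions $\mu, \nu \in \mathbb{R}^n_{+}$ of equal total mass with a cost matrix $C$. The symbols $\beta$, $C$ and $\Gamma(\mu, \nu)$ are assumptions introduced here, not taken from the paper.

```latex
W(\mu, \nu)
  = \min_{\gamma \in \Gamma(\mu, \nu)} \langle C, \gamma \rangle
  = \max_{\alpha, \beta \in \mathbb{R}^n} \; \langle \alpha, \mu \rangle + \langle \beta, \nu \rangle
  \quad \text{s.t.} \quad \alpha_i + \beta_j \le C_{ij} \;\; \forall\, i, j,
\qquad
\Gamma(\mu, \nu) = \{ \gamma \in \mathbb{R}^{n \times n}_{+} : \gamma \mathbf{1} = \mu,\; \gamma^{\top} \mathbf{1} = \nu \}.
```

In this form the loss is a maximum over dual potentials, which is the kind of expression that Lipschitz arguments such as those in Lemmas A to D typically bound.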
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Count2Density: Crowd Density Estimation without Location-level Annotations
Litrico, Mattia, Chen, Feng, Pound, Michael, Tsaftaris, Sotirios A, Battiato, Sebastiano, Giuffrida, Mario Valerio
Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations (i.e., points placed on top of each individual) to train deep networks. Collecting such detailed annotations is tedious and time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., the total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an exponential moving average (EMA) of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samples determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser that encourages similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to retrieve spatial information from count-level annotations and to enable accurate subregion counting.
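The two mechanisms named above, an EMA update of the historical map and hypergeometric sampling of as many locations as the image-level count, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the momentum value, the discretisation of the map into integer "tickets", and the helper names `ema_update` and `sample_pseudo_points` are hypothetical. The sampling uses NumPy's `Generator.multivariate_hypergeometric` and returns a point map rather than a smoothed density map.

```python
import numpy as np

def ema_update(bank, pred, momentum=0.9):
    # Exponential moving average of past predictions; momentum=0.9 is a hypothetical value.
    return momentum * bank + (1.0 - momentum) * pred

def sample_pseudo_points(bank, count, tickets=10_000, rng=None):
    # Discretise the (non-negative) map into integer "tickets" per pixel, proportional
    # to its value, then draw `count` tickets without replacement with a
    # multivariate hypergeometric distribution. A pixel may receive several points.
    rng = np.random.default_rng() if rng is None else rng
    p = np.clip(bank, 0.0, None).ravel()
    p = p / p.sum()
    colors = np.floor(p * tickets).astype(np.int64)
    if colors.sum() < count:
        raise ValueError("too few tickets for the requested count; increase `tickets`")
    draws = rng.multivariate_hypergeometric(colors, count)
    return draws.reshape(bank.shape).astype(float)  # point map summing to `count`

rng = np.random.default_rng(0)
bank = np.zeros((4, 4))
bank[1:3, 1:3] = 1.0            # initial prior, e.g. from a saliency estimator
pred = np.zeros((4, 4))
pred[2, 2] = 4.0                # a new density prediction from the network
bank = ema_update(bank, pred)   # bank[2, 2] is now about 1.3, the other crowd pixels 0.9
points = sample_pseudo_points(bank, count=5, rng=rng)  # 5 = image-level count annotation
```

Sampling without replacement from a finite pool of tickets means that no pixel can receive more points than its share of the map allows, and the total always equals the count annotation, so the pseudo-label is spatially guided by the bank but numerically tied to the ground-truth count.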
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
Crowd Scene Analysis using Deep Learning Techniques
With recent advances in deep learning and computer vision, crowd scene analysis has gained significant attention. The UN predicts world population growth of 0.82% by 2035, driving people to cities in search of better lifestyles and social events such as concerts, shopping, political gatherings, and educational conferences. Crowd scene analysis is crucial for ensuring a safe environment in public spaces, but manual monitoring is laborious and risks missing important information. An automatic solution is therefore needed for efficient real-world applications. Our research focuses on two main applications of crowd scene analysis: crowd counting and anomaly detection.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Instructional Material (1.00)
- Research Report > Promising Solution (0.92)
A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals
Cui, Zhe, Li, Yuli, Tran, Le-Nam
Current crowd-counting models often rely on single-modal inputs, such as visual images or wireless signal data, which can result in significant information loss and suboptimal recognition performance. To address these shortcomings, we propose TransFusion, a novel multimodal fusion-based crowd-counting model that integrates Channel State Information (CSI) with image data. By leveraging the capabilities of Transformer networks, TransFusion effectively combines these two distinct data modalities, enabling it to capture the comprehensive global contextual information that is critical for accurate crowd estimation. However, while Transformers are well suited to capturing global features, they may fail to identify the finer-grained local details essential for precise crowd counting. To mitigate this, we incorporate Convolutional Neural Networks (CNNs) into the model architecture, enhancing its ability to extract detailed local features that complement the global context provided by the Transformer. Extensive experimental evaluations demonstrate that TransFusion achieves high accuracy with minimal counting errors while maintaining superior efficiency.
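The fusion step described above, letting attention mix tokens from both modalities before regressing a count, can be illustrated with a deliberately reduced sketch. It assumes that image tokens are CNN feature vectors (the local-detail branch) and that CSI tokens have already been projected to the same width; the Transformer is reduced to one self-attention head, and `fuse`, `params` and the pooling-plus-linear count head are hypothetical choices, not TransFusion's actual architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over all tokens.
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def fuse(image_tokens, csi_tokens, params):
    # Concatenating the two token sets lets every image token attend to every
    # CSI token (and vice versa), which is where cross-modal mixing happens.
    tokens = np.concatenate([image_tokens, csi_tokens], axis=0)
    attended = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    pooled = attended.mean(axis=0)          # global summary of both modalities
    return float(pooled @ params["w_out"])  # scalar crowd-count estimate (hypothetical head)

rng = np.random.default_rng(0)
d = 8
params = {name: rng.normal(size=(d, d)) for name in ("Wq", "Wk", "Wv")}
params["w_out"] = rng.normal(size=d)
image_tokens = rng.normal(size=(16, d))   # e.g. a 4x4 CNN feature map, flattened
csi_tokens = rng.normal(size=(4, d))      # e.g. 4 CSI packets projected to width d
count = fuse(image_tokens, csi_tokens, params)
```

Concatenating tokens is only one possible fusion design; cross-attention from one modality to the other would be an equally plausible reading of the abstract.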
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Yang, Maochen, Li, Zekun, Zhang, Jian, Qi, Lei, Shi, Yinghuan
Semi-supervised crowd counting is crucial for addressing the high annotation costs of densely populated scenes. Although several methods based on pseudo-labeling have been proposed, it remains challenging to utilize unlabeled data effectively and accurately. In this paper, we propose a novel framework called Taste More Taste Better (TMTB), which emphasizes both data and model aspects. First, we explore a data augmentation technique well suited to the crowd counting task. By inpainting the background regions, this technique can effectively enhance data diversity while preserving the fidelity of the entire scene. Second, we introduce the Visual State Space Model as the backbone to capture global context information from crowd scenes, which is crucial for extremely crowded, low-light, and adverse-weather scenarios. In addition to the traditional regression head for exact prediction, we employ an Anti-Noise classification head that provides less exact but more accurate supervision, since the regression head is sensitive to noise in manual annotations. We conduct extensive experiments on four benchmark datasets and show that our method outperforms state-of-the-art methods by a large margin. Code is publicly available at https://github.com/syhien/taste_more_taste_better.
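Why a classification target can be "less exact but more accurate" is easiest to see by binning counts into intervals. The sketch below assumes per-patch counts and hypothetical interval edges (`BIN_EDGES`); it is not the TMTB head itself, only an illustration that small annotation noise shifts every regression target while leaving the interval labels unchanged.

```python
import numpy as np

# Hypothetical interval edges for per-patch counts (the abstract does not give them).
BIN_EDGES = np.array([0.0, 0.5, 2.0, 5.0, 10.0, 20.0, np.inf])

def count_to_class(patch_counts, edges=BIN_EDGES):
    # Class c covers the half-open interval [edges[c], edges[c + 1]).
    return np.searchsorted(edges, patch_counts, side="right") - 1

clean = np.array([0.0, 3.0, 7.2, 12.0])             # per-patch counts
noisy = clean + np.array([0.1, -0.3, 0.4, -0.2])    # small annotation errors
reg_gap = np.abs(noisy - clean)                     # every regression target shifts
same_class = count_to_class(clean) == count_to_class(noisy)
```

Here every regression target moves, but all four interval labels are unchanged. The protection is not absolute: a count that sits close to an interval edge can still be pushed into a neighbouring class by noise.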
- North America > United States > New York > New York County > New York City (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)